@Realtyxxx

My direct goal is to solve the problem of the tptr dialect being dropped: mlir-opt can't recognize the tptr dialect, so I used a pass manager instead.
Furthermore, I referred to triton-cpu and triton-linalg to handle the conversion of unstructured ptr types to LLVM.
I tested with nhat's test case, verified its correctness, and placed it in the test directory.

Path(ttshared_path).write_text(ttsharedir)
mlir_opt_path = _get_llvm_bin_path("mlir-opt")
# TritonShared-MLIR to LLVM-MLIR
subprocess.check_call([mlir_opt_path, ttshared_path,
Contributor

Could we preserve all the original comments? They explain why we need certain passes.

Author

Of course

Contributor

This is not resolved, there are other comments here.

Author

Sorry, I misunderstood what you meant before. I will add them back.

@red1bluelost
Contributor

red1bluelost commented Aug 18, 2025

I like the idea of this change, but for it to be viable we need to limit the scope of its impact. Anything associated with TPTR is temporary until all the necessary features are in the upstream ptr dialect.

Please revert the edits that removed conversions from TT/TTS IR to ptr/tptr IR.

An ideal situation would be to just add the TptrToLLVM pass that operates purely on upstream dialects plus tptr IR (no TT or TTS IR), and then use the pass manager you wrote instead of the opt invocation.

@nhat-nguyen Do you think we'd want to keep around the old opt invocation for when we can use all upstream ptr dialect features or would it be fine to fully move to the pass manager approach here?

@Realtyxxx
Author

I understand that you want to keep the modifications within MLIR. The tptr dialect seems to be waiting for MLIR's ptr dialect to be fully completed. (I tried to modify tptr to use MLIR's ptr dialect, but mlir-opt does not yet recognize the ptr dialect, and the ptr dialect currently does not provide conversions to or from other dialects.) I chose this approach for various reasons (given my very limited experience with MLIR and LLVM), and I welcome everyone's feedback.

@red1bluelost
Contributor

The MLIR opt tool should recognize the ptr dialect but is not expected to recognize tptr. The pass manager approach should recognize the ptr dialect; if it does not, then we need to register the ptr dialect somewhere. We do not necessarily need to use ptr dialect operations just yet, but we want to keep using the ptr.ptr type.

@Realtyxxx
Author

Yes, with the pass manager approach we can register the ptr dialect in its context.
I would also like to know: is it acceptable to use the LLVM dialect in tt_shared IR here?

@red1bluelost
Contributor

Looking at the general Triton+TritonShared+CPU pipeline, there are three stages:

  1. Triton Frontend
  2. Triton Shared middle conversions
  3. MLIR backend, CPU as an example here but companies likely have a different backend

We should not add LLVM dialect to stage 1 or 2. This is mainly because it is a very low level dialect and might not mesh as well during the early stages of the stage 3 MLIR backends. Your TptrToLLVM pass would be running somewhere near the end of stage 3 so is fine to generate LLVM dialect.
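To make the stage rule concrete, here is a minimal Python sketch of a check that could run over each stage's output. The stage names, the helper name and the per-stage dialect sets are hypothetical placeholders, not taken from triton-shared; in a real pipeline the sets would come from walking the IR after each stage.

```python
# Hypothetical stage names for the three-stage pipeline described above.
STAGES = ("triton-frontend", "triton-shared-middle", "mlir-backend")

# Stages 1 and 2 must not emit the LLVM dialect; stage 3 may (late in the pipeline).
RESTRICTED_STAGES = STAGES[:2]


def stages_emitting_llvm_too_early(dialects_per_stage):
    """Return the restricted stages whose output IR contains the `llvm` dialect.

    `dialects_per_stage` maps a stage name to the set of dialect namespaces
    seen in that stage's output (assumed to be collected by walking the IR).
    """
    return [
        stage
        for stage in RESTRICTED_STAGES
        if "llvm" in dialects_per_stage.get(stage, set())
    ]


# Intended layout (illustrative sets): llvm appears only in stage 3, after TPtrToLLVM.
good = {
    "triton-frontend": {"tt", "arith", "scf"},
    "triton-shared-middle": {"arith", "linalg", "memref", "ptr", "tptr"},
    "mlir-backend": {"llvm"},
}
# Layout the rule forbids: LLVM dialect produced during the middle conversions.
bad = {**good, "triton-shared-middle": {"linalg", "memref", "llvm"}}

print(stages_emitting_llvm_too_early(good))  # []
print(stages_emitting_llvm_too_early(bad))   # ['triton-shared-middle']
```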

@Realtyxxx
Author

I am still getting familiar with MLIR and LLVM.
Initially, I inserted this pass at the point where compilation failed (before convert-cf-to-llvm, which did not accept the cf arguments here).
Thanks for your advice; I will try changing the pass's position.
Besides that, are you interested in introducing the pass manager?

@red1bluelost
Contributor

If we can add this TPtrToLLVM conversion then the pass manager is worth it. This should hopefully enable dynamic pointer end-to-end compilation for the CPU backend. If we cannot support TPtrToLLVM conversion, then the pass manager does not have much benefit.

@Realtyxxx
Author

Understandable

@Realtyxxx
Author

Realtyxxx commented Aug 19, 2025

To make sure we all understand the current plan: I am now lowering tptr to LLVM at the end of stage 3:

[image not available]

So, would it be OK to change tptr to use the ptr.ptr type and keep the pass where it is?

Or do you mean I need to change the position of this pass? If so, could you suggest where it should go?

@red1bluelost

@red1bluelost
Contributor

red1bluelost commented Aug 19, 2025

I think the current position is good. If it helps to move it one to three passes earlier or later, that is also fine, but try to keep it near the other *_to_llvm passes.

Any tptr operation should operate on integers, ptr.ptr types, and/or memref types. The changes to the TritonToLinalgExperimental passes should be reverted so that Triton and TritonStructured operations and types do not reach stage 3.
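The placement guidance can be sketched as a small pipeline helper. This is a hypothetical Python model, not triton-shared code: the helper names and the ordering of `BASE_PIPELINE` are illustrative assumptions. The entries other than `tptr-to-llvm` are upstream MLIR pass names, and `tptr-to-llvm` is the pass name used later in this thread.

```python
# Illustrative lowering tail; the real triton-shared CPU pipeline may differ.
BASE_PIPELINE = [
    "convert-linalg-to-loops",
    "convert-scf-to-cf",
    "convert-arith-to-llvm",
    "convert-cf-to-llvm",
    "convert-func-to-llvm",
    "finalize-memref-to-llvm",
    "reconcile-unrealized-casts",
]


def insert_before(pipeline, new_pass, anchor):
    """Return a copy of `pipeline` with `new_pass` placed right before `anchor`."""
    if anchor not in pipeline:
        raise ValueError(f"anchor pass {anchor!r} is not in the pipeline")
    idx = pipeline.index(anchor)
    return pipeline[:idx] + [new_pass] + pipeline[idx:]


def near_other_llvm_passes(pipeline, name, window=3):
    """True if another *-to-llvm pass sits within `window` positions of `name`."""
    i = pipeline.index(name)
    neighbours = pipeline[max(0, i - window): i + window + 1]
    return any(p.endswith("-to-llvm") and p != name for p in neighbours)


# Hypothetical placement: just before convert-cf-to-llvm, as in the original PR.
pipeline = insert_before(BASE_PIPELINE, "tptr-to-llvm", "convert-cf-to-llvm")
print(pipeline.index("tptr-to-llvm"))                    # 3
print(near_other_llvm_passes(pipeline, "tptr-to-llvm"))  # True
```

Returning a new list rather than mutating `BASE_PIPELINE` keeps the base pipeline reusable when trying out several candidate positions.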

@Realtyxxx
Author

Understood, I will implement it

@Realtyxxx
Author

Hi @red1bluelost, I have pushed new changes. Please review again.

@Realtyxxx Realtyxxx requested a review from red1bluelost August 22, 2025 10:25
@Realtyxxx Realtyxxx force-pushed the tyx/tptr-to-llvm branch 2 times, most recently from d9fb190 to 9340b85 August 25, 2025 03:55
Contributor

@red1bluelost red1bluelost left a comment

New set of review comments.

// Option<"typeoffsetToConst", "typeoffset-to-const", "bool", /*default*/"true",
// "Convert tptr.typeoffset to llvm.mlir.const">,
];
let dependentDialects = ["mlir::tptr::TPtrDialect", "mlir::LLVM::LLVMDialect"];
Contributor

I suspect we will want ptr::PtrDialect included here eventually if not now.

Author

I previously tried using the ptr.ptr type, and I think that to use the ptr dialect we need to wait until fabianmcg merges most of the ptr-related changes into LLVM.

Contributor

Can we add ptr::PtrDialect to the list of 'dependentDialects' here?

Author

Previously I thought we only used PtrType, so we didn't add PtrDialect, but of course we can.

@Realtyxxx
Author

It looks like there's an issue with the GitHub tool or with my push. I've resolved all the conversations; please take another look.
@red1bluelost

@red1bluelost
Contributor

Please do not resolve my comments. I will resolve them as I go through a re-review.

Contributor

@red1bluelost red1bluelost left a comment

Please rebase your changes onto an updated Triton Shared.

CMakeLists.txt (outdated)

Python3::Module
pybind11::headers
${Python3_LIBRARIES}
Contributor

Please remove Python3_LIBRARIES.

@@ -1,17 +1,18 @@
#include "mlir/Interfaces/SideEffectInterfaces.h" // Required for IR/TPtrOps.h.inc
#include "mlir/Bytecode/BytecodeOpInterface.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"
Contributor

Please remove all edits from this file.

Author

I have reverted it to the original file style. Just to confirm: do I also need to rebase onto the original commit and drop every modification to this file from the history, so that no record of those changes remains in the commits?

}

#endif // TPTR_DIALECT

Contributor

Please remove edits from this file.

Author

Same question as in the previous conversation.


#define GEN_PASS_DECL
#include "triton-shared/Conversion/TPtrToLLVM/Passes.h.inc"
void populateTPtrToLLVMConversionPatterns(RewritePatternSet &patterns,
Contributor

(nit) Put a new line between the include and function declaration.

Comment on lines 10 to 14
MLIRIR
MLIRPass
MLIRTransforms
MLIRSupport
MLIRReconcileUnrealizedCasts
TPtrIR
MLIRDialectUtils
Contributor

(nit) Please keep these in alphabetical order.

mlir::triton::TritonDialect,
mlir::cf::ControlFlowDialect, mlir::scf::SCFDialect,
mlir::math::MathDialect, mlir::arith::ArithDialect,
// mlir::gpu::GPUDialect,
Contributor

Delete this.

Contributor

Also, please run clang-format.

@Realtyxxx
Author

Sorry, I will start making changes based on your suggestions now.

@Dasor

Dasor commented Sep 2, 2025

Hi @Realtyxxx and reviewer @red1bluelost, I just wanted to add some things I found about this PR that may prove useful.

First of all, thanks for the PR; it has come in handy for a downstream implementation I'm working on. However, I found a serious bug that makes some programs crash.

@Realtyxxx, in TPtrToLLVM.cpp you have three rewrites that don't seem quite right: MemRefAllocConverter, MemRefLoadConverter and MemRefStoreConverter.

First, it doesn't seem right that a pass lowering tptr to LLVM targets ptr in those rewrites. I get that ptr lowering may still be required, but it's a bit surprising in the context of a tptr-to-llvm pass.

But the real big mistake is on MemRefAllocConverter in this part:

    // For now, use alloca instead of malloc to avoid complex call setup
    Value totalSize = rewriter.create<LLVM::ConstantOp>(
        loc, i64Ty, rewriter.getIntegerAttr(i64Ty, totalElements));

    Value allocatedPtr = rewriter.create<LLVM::AllocaOp>(
        loc, ptrTy, ptrTy, totalSize, /*alignment=*/0);

What is being done here is incorrect: you are rewriting a memref.alloc op into an llvm.alloca. This is a big issue because memref.alloc uses heap memory, while llvm.alloca uses stack memory. In the best case this just leads to a memory leak, but if deallocation passes are used in the pipeline (as they should be), the program will crash with an invalid free, since it tries to free stack memory.

What I suggest is adding some type conversions in your populateTPtrToLLVMConversionPatterns, as done in this LLVM upstream PR; something like this, adapted to this code:

  // Add address space conversions.
  converter.addTypeAttributeConversion(
      [&](PtrLikeTypeInterface type, ptr::GenericSpaceAttr memorySpace)
          -> TypeConverter::AttributeConversionResult {
        if (type.getMemorySpace() != memorySpace)
          return TypeConverter::AttributeConversionResult::na();
        return IntegerAttr::get(IntegerType::get(type.getContext(), 32), 0);
      });

  // Add type conversions.
  converter.addConversion([&](ptr::PtrType type) -> Type {
    std::optional<Attribute> maybeAttr =
        converter.convertTypeAttribute(type, type.getMemorySpace());
    auto memSpace =
        maybeAttr ? dyn_cast_or_null<IntegerAttr>(*maybeAttr) : IntegerAttr();
    if (!memSpace)
      return {};
    return LLVM::LLVMPointerType::get(type.getContext(),
                                      memSpace.getValue().getSExtValue());
  });

Alternatively, you could use malloc, but I think adding the TypeConverter should be easier and cleaner.

I haven't tried it myself, since I work with a downstream LLVM that already has this. Anyway, I hope this comment helps!

@Realtyxxx
Author

Thank you very much for your feedback; I have started looking into it.

@Realtyxxx
Author

Realtyxxx commented Sep 3, 2025

@Dasor

  1. memref.alloc should be converted to malloc, which is the correct approach. The pass that previously converted to memref did not generate a free, so there was no error. This has been fixed. By the way, @red1bluelost, is there any plan for this memory leak?
  2. The memory space in PtrType may be introduced in the future. I will use the conversion method you recommended, with minor modifications so that the conversion also works when there is no memory space.
  3. Could you give more specific advice about MemRefLoadConverter and MemRefStoreConverter?
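One detail worth keeping in mind when switching from alloca to malloc: llvm.alloca takes an element type plus an element count, whereas malloc takes a size in bytes. A minimal Python sketch of the byte-size arithmetic for a statically shaped memref of pointers, assuming a 64-bit target where each pointer is 8 bytes; the helper name is hypothetical and alignment padding is ignored.

```python
from math import prod

# Assumption: 64-bit target, so each !ptr.ptr element lowers to an 8-byte LLVM pointer.
POINTER_SIZE_BYTES = 8


def malloc_size_bytes(shape, elem_size=POINTER_SIZE_BYTES):
    """Byte count to pass to malloc for a statically shaped memref.

    alloca counts elements, malloc counts bytes, so the element count
    (product of the static dimensions) is scaled by the element size.
    This sketch uses a negative value to mark a dynamic dimension.
    """
    if any(dim < 0 for dim in shape):
        raise ValueError("dynamic dimensions are out of scope for this sketch")
    return prod(shape) * elem_size


print(malloc_size_bytes([64, 32]))  # 16384
print(malloc_size_bytes([128]))     # 1024
```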

@Dasor

Dasor commented Sep 3, 2025

  1. Yes, there will be a memory leak if there is no dealloc pass (currently there is no such pass in triton-shared's CPU pipeline), but there should be one (just add the --buffer-deallocation-pipeline pass), and in that case the alloca-based lowering crashes.
  2. You can try doing the conversion from the tptr memory space instead.
  3. If you implement the type conversion, you don't need MemRefLoadConverter or MemRefStoreConverter (not even MemRefAllocConverter), as LLVM will take care of them automatically through the TypeConverter when it sees a memref of ptr type.
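Point 2 can be illustrated with a toy Python model of the quoted C++ address-space conversion. Everything here is hypothetical: memory spaces are plain strings rather than MLIR attributes, `NOT_APPLICABLE` stands in for `AttributeConversionResult::na()`, rules are tried in list order, and mapping `#tptr.default_memory_space` (a name that appears in this thread) to address space 0 is an assumption.

```python
# Toy model: memory spaces are plain strings here, not MLIR attributes.
NOT_APPLICABLE = object()  # stands in for AttributeConversionResult::na()


def generic_space_rule(memory_space):
    """Mirror of the quoted C++ rule: the generic space maps to address space 0."""
    if memory_space != "#ptr.generic_space":
        return NOT_APPLICABLE
    return 0


def tptr_default_space_rule(memory_space):
    """Hypothetical extra rule for tptr's memory space (address space 0 is an assumption)."""
    if memory_space != "#tptr.default_memory_space":
        return NOT_APPLICABLE
    return 0


def convert_ptr_type(memory_space, rules):
    """Return an LLVM pointer type string, or None if no rule applies (conversion failure)."""
    for rule in rules:
        addr_space = rule(memory_space)
        if addr_space is NOT_APPLICABLE:
            continue
        return "!llvm.ptr" if addr_space == 0 else f"!llvm.ptr<{addr_space}>"
    return None


rules = [generic_space_rule, tptr_default_space_rule]
print(convert_ptr_type("#ptr.generic_space", rules))              # !llvm.ptr
print(convert_ptr_type("#tptr.default_memory_space", rules))      # !llvm.ptr
print(convert_ptr_type("#tptr.default_memory_space", rules[:1]))  # None
```

The last line shows the failure mode point 2 is about: with only the generic-space rule, a pointer in the tptr memory space has no conversion, which mirrors the C++ lambda returning an empty type.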

@Realtyxxx
Author

Realtyxxx commented Sep 3, 2025

I understand what you mean; I'll give it a try. The reason I took this approach before is that I ran into the form memref<64x32x!ptr.ptr<#tptr.default_memory_space>>, which is also why I added the changes for these three ops here.

@Realtyxxx
Author

The branch has been rebased and squashed. Please take another look. @red1bluelost @Dasor

@Dasor

Dasor commented Sep 4, 2025

The malloc looks good; however, there is still no conversion for memref.dealloc. I think all the memref conversion patterns can be avoided with the TypeConverter, but yours doesn't seem to be doing anything (otherwise it would be converting dealloc). I think that's because, rather than using your own TypeConverter, you need to use the LLVMTypeConverter from lib/Conversion/TPtrToLLVM/TPtrToLLVMPass.cpp and add the conversion there, but I'm not 100% sure.

You can use this example to test whether tptr-to-llvm is working correctly:

module {
  func.func @kernel(%ptr: !ptr.ptr<#ptr.generic_space>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c128 = arith.constant 128 : index
    %alloc_4 = memref.alloc() {alignment = 64 : i64} : memref<128x!ptr.ptr<#ptr.generic_space>>
    scf.for %arg9 = %c0 to %c128 step %c1 {
      memref.store %ptr, %alloc_4[%arg9] : memref<128x!ptr.ptr<#ptr.generic_space>>
    }
    memref.dealloc %alloc_4 : memref<128x!ptr.ptr<#ptr.generic_space>>
    return
  }
}

@Realtyxxx
Author

Using LLVMTypeConverter didn't change anything: ptr-to-llvm is handled together with the other conversions in the convert-to-llvm pass, which does not handle the memref cases at the same time.

@Dasor

Dasor commented Sep 4, 2025

Well, I think that's the point: if LLVM already handles that with convert-to-llvm, there is no need to redo the work. We just need to add that pass to the pass pipeline and let tptr-to-llvm handle only the tptr dialect, without changing anything related to the ptr dialect.

I think once #331 gets merged and #329 and #327 are done, the only parts of TPtr we will need are IntToPtr and PtrToInt.

@Realtyxxx
Author

I have a similar view. In the future, the tptr dialect in triton-shared may be replaced by the improved ptr dialect, so it seems my changes will become unnecessary.

@Realtyxxx Realtyxxx closed this Sep 10, 2025
